Exploring and analysing large network-structured data has recently attracted growing interest in the health field, particularly within emergency medical services (EMS) and ambulance systems. This growth is largely due to advances in technology, such as the Global Positioning System (GPS), mobile devices, and remote sensing, which have significantly contributed to the precision and volume of spatio-temporal data (S. Wang, Cao, and Philip 2020). This type of data captures both spatial and temporal context. The spatial component describes the location or spatial geometry, while the temporal component records time information such as a timestamp or time interval (Rao, Govardhan, and Rao 2012).
Emergency medical services and ambulance systems play an important role in ensuring that patient transfers are performed effectively and in a timely manner. This is especially important for older people, for whom delays can increase health risks (Harmsen et al. 2015). Older individuals often require continuous support, including 24-hour care, assistance with daily tasks, and medical supervision. Thus, many reside in residential aged care facilities (RACFs), which are specifically designed to provide this comprehensive care (Kearney and Winterbottom 2006). RACFs frequently rely on ambulance services to transfer residents to hospital, whether for an acute emergency or for a planned or scheduled medical appointment. Demand for these transfers is increasing, driven by population ageing (Harris and Sharma 2018). This puts considerable pressure on ambulance resources and highlights the need for efficient planning and utilisation of the service.
To gain insight into transfer patterns, data exploration combined with network representations linking RACFs and hospitals provides a powerful framework. Network-based representations are naturally perceived as relational, and people readily associate them with flows and connections. However, most network research tends to follow a uniform approach, focusing primarily on the topological properties of the network while overlooking other important information, for example, the associations between variables. While network representation is a powerful tool, especially for transfer data, overemphasising network topology can lead to the neglect of the fundamental principle of data exploration, which encourages free investigation of the data to uncover patterns and unexpected results. Simple, informative analyses, such as examining variable distributions, temporal trends, or bivariate relationships, can reveal insights into transfer frequency, efficiency, and demand that are often masked by the network structure. Integrating data exploration with network-based approaches therefore allows for a better understanding of transfers between RACFs and hospitals.
Studying how infectious diseases spread through the network of transfers between RACFs and hospitals is important because older people tend to face a higher risk of mortality during outbreaks (Parohan et al. 2020). Patient transfers between facilities create pathways for a disease to be transmitted across the system, which can lead to rapid spread. Traditional compartmental infectious disease models that assume a homogeneous or static structure do not adequately capture networks that change over time. In reality, ambulance transfers are highly dynamic: connections between facilities can change in response to demand, constraints, and even outbreak conditions. Understanding these transmission dynamics is therefore crucial for devising effective policies to stop the spread of disease. Combining a data exploration framework, which helps to understand how the transfer network changes over time, with dynamic modelling, which assesses how these changes affect disease spread, is critical for identifying high-risk facilities, transfer connections, and time periods.
Project 1a: A Multivariate Spatio-Temporal Network Data Exploration Framework
As multivariate spatio-temporal network data become more accessible and complex, understanding their structure and dynamics is key to effective decision-making. A major challenge in analysing large multivariate networks is the amount of information they contain, much of which is overlooked. By integrating exploratory data analysis (EDA) with network-based representations, the proposed framework aims to support the examination of associations between variables, temporal changes, and structural differences within the network. The framework requires seamless integration of the following key processes: data storage, cleaning, subsetting, visualisation, and visual inference. The following sections therefore review existing tools that support these processes and discuss their limitations.
Data Storage and Cleaning
Data cleaning is the first stage of a reliable analysis. Spatio-temporal data usually need to be checked for inconsistent temporal records, duplicated records, and spatial inaccuracies. Adding a network structure on top of that, with nodes, edges, and their attributes, requires the network topology to be preserved throughout the process. Typically, this stage involves tools such as dplyr (Wickham et al. 2023) for manipulating the data, tsibble (E. Wang, Cook, and Hyndman 2020) for validating temporal consistency, sf (Pebesma 2018) for checking coordinate accuracy, and igraph or network (Butts 2008) for maintaining the network structure.
The tidygraph package (Pedersen 2024b) provides a tidy API for graph and network manipulation, in which network data are treated as two tidy tables, one for node data and one for edge data. In tidy data (Wickham 2014), each variable has its own column, each observation has its own row, and each value has its own cell. The two tables are stored together within a tbl_graph object, which preserves the underlying network topology while allowing standard dplyr verbs to be applied. Interaction with the node and edge tables is handled by a special function, activate(), which lets the user switch between the two tables and apply dplyr operations such as mutate(), group_by(), and joins.
There are two main functions for creating a tbl_graph object: as_tbl_graph() and tbl_graph(). The first, as_tbl_graph(), takes an object of another class, such as a data.frame, igraph, or network object, and converts it into a tbl_graph object. The second, tbl_graph(), takes two data.frame objects, one for nodes and one for edges.
as_tbl_graph(edges)
# A tbl_graph: 815 nodes and 4692 edges
#
# A directed acyclic simple graph with 2 components
#
# Node Data: 815 × 1 (active)
name
<chr>
1 1 ABERDEEN STREET RESERVOIR
2 1 ADENEY STREET CAMPERDOWN
3 1 AITKEN AVENUE DONALD
4 1 CARINYA CRESCENT KORUMBURRA
5 1 CHESTNUT ROAD DOVETON
6 1 CHIVERS ROAD TEMPLESTOWE
7 1 CLAYTON ROAD BALWYN
8 1 COBB ROAD MOUNT ELIZA
9 1 ELEANOR STREET HEYFIELD
10 1 FOLEY STREET TERANG
# ℹ 805 more rows
#
# Edge Data: 4,692 × 8
from to weight long_hosp lat_hosp long_racf lat_racf category
<int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 1 628 1 145. -37.8 145. -37.7 RACF
2 1 629 218 145. -37.8 145. -37.7 RACF
3 1 630 2 145. -37.8 145. -37.7 RACF
# ℹ 4,689 more rows
tbl_graph(nodes, edges)
# A tbl_graph: 815 nodes and 4692 edges
#
# A directed acyclic simple graph with 2 components
#
# Node Data: 815 × 4 (active)
name longitude latitude type
<chr> <dbl> <dbl> <chr>
1 1 ABERDEEN STREET RESERVOIR 145. -37.7 racf
2 1 ADENEY STREET CAMPERDOWN 143. -38.2 racf
3 1 AITKEN AVENUE DONALD 143. -36.4 racf
4 1 CARINYA CRESCENT KORUMBURRA 146. -38.4 racf
5 1 CHESTNUT ROAD DOVETON 145. -38.0 racf
6 1 CHIVERS ROAD TEMPLESTOWE 145. -37.8 racf
7 1 CLAYTON ROAD BALWYN 145. -37.8 racf
8 1 COBB ROAD MOUNT ELIZA 145. -38.2 racf
9 1 ELEANOR STREET HEYFIELD 147. -38.0 racf
10 1 FOLEY STREET TERANG 143. -38.2 racf
# ℹ 805 more rows
#
# Edge Data: 4,692 × 8
from to weight long_hosp lat_hosp long_racf lat_racf category
<int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 1 628 1 145. -37.8 145. -37.7 RACF
2 1 629 218 145. -37.8 145. -37.7 RACF
3 1 630 2 145. -37.8 145. -37.7 RACF
# ℹ 4,689 more rows
The difference between the two approaches is that as_tbl_graph() needs only the edge dataset, so all the multivariate information sits in the edge data and the node data contains only the name (location). With tbl_graph(), the node table is supplied explicitly, which is useful when the nodes carry their own attributes.
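The contrast between the two constructors, and the role of activate(), can be illustrated on a toy network. The following is a minimal sketch only: the node names, the type and weight values, and the derived columns is_hospital and share are hypothetical and are not taken from the transfer data.

```r
library(tidygraph)
library(dplyr)

# Hypothetical toy data: two RACFs and one hospital (not the study data)
nodes <- tibble::tibble(
  name = c("RACF A", "RACF B", "Hospital X"),
  type = c("racf", "racf", "hospital")
)
edges <- tibble::tibble(
  from   = c("RACF A", "RACF B"),
  to     = c("Hospital X", "Hospital X"),
  weight = c(12, 3)
)

# Edge list only: the node table holds just the names found in from/to
g_edges_only <- as_tbl_graph(edges, directed = TRUE)

# Explicit node table: node attributes such as `type` are retained;
# character from/to values are matched against the `name` column
g_full <- tbl_graph(
  nodes = nodes, edges = edges,
  directed = TRUE, node_key = "name"
)

# Switch between the two tables with activate() and use dplyr verbs on each
g_full <- g_full |>
  activate(nodes) |>
  mutate(is_hospital = type == "hospital") |>
  activate(edges) |>
  mutate(share = weight / sum(weight))

node_cols <- g_full |> activate(nodes) |> as_tibble() |> names()
edge_tbl  <- g_full |> activate(edges) |> as_tibble()
```

In this sketch, g_edges_only has a single node column (name), whereas g_full keeps type alongside the newly derived node and edge columns.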
For spatial networks, the sfnetworks package (van der Meer et al. 2024) extends tidygraph by allowing spatial geometries to be incorporated directly within the tbl_graph object. It is useful for dealing with complex geometries where edges are not straight-line connections, such as road or transport networks. The package also allows the standard spatial operations of the sf package to be performed within the network context. In contrast, the temporal data structures provided by tsibble are not directly compatible with tidygraph objects. As a result, validating temporal consistency requires either converting the data back to a tsibble object or performing the temporal checks before the tbl_graph is created.
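One possible ordering of these steps is to run the temporal checks on the raw transfer records first and build the network only afterwards. The sketch below assumes a hypothetical table of daily transfer counts with columns date, from, to, and n; none of these names or values come from the study data, and the aggregation into a single edge weight is only one of several reasonable choices.

```r
library(tsibble)
library(tidygraph)
library(dplyr)

# Hypothetical daily transfer counts (not the study data)
transfers <- tibble::tibble(
  date = as.Date(c("2024-01-01", "2024-01-02", "2024-01-02", "2024-01-04")),
  from = "RACF A",
  to   = "Hospital X",
  n    = c(2, 1, 1, 3)
)

# 1. Duplicated (from, to, date) records would make as_tsibble() fail,
#    so test for them on the plain data frame first
has_dupes <- is_duplicated(transfers, key = c(from, to), index = date)

# 2. Collapse the duplicates, convert to a tsibble, and look for
#    implicit gaps in the daily series of each (from, to) pair
transfers_ts <- transfers |>
  group_by(date, from, to) |>
  summarise(n = sum(n), .groups = "drop") |>
  as_tsibble(key = c(from, to), index = date)
gaps <- has_gaps(transfers_ts)

# 3. Only then aggregate over time and build the network object
edges <- transfers_ts |>
  as_tibble() |>
  group_by(from, to) |>
  summarise(weight = sum(n), .groups = "drop")
graph <- as_tbl_graph(edges)
```

Here has_dupes flags the repeated record on 2 January and gaps reports the missing 3 January, both of which would be harder to detect once the records have been aggregated into edge weights.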
Data subsetting
Data subsetting extracts part of a spatio-temporal network dataset based on spatial, temporal, and multivariate variables. This includes grouping data by time period or region, as well as filtering on variable values and network characteristics (e.g., in-degree). In a network context, filtering operations need to account for the topological dependencies between nodes and edges. When nodes are removed based on a condition, all edges incident to those nodes are also deleted (Figure 1). In contrast, when edges are removed, the nodes connected to those edges are preserved, since nodes can exist independently of an edge (Figure 2). The tidygraph package supports these subsetting operations through dplyr functions such as filter() and select(), which are applied separately to nodes and edges while keeping the underlying network valid. As with data manipulation, users need to switch between the node and edge tables to subset on their respective attributes.
(a) Full network
(b) Node to be removed
(c) Filtered network
Figure 1: Node filtering
(a) Full network
(b) Edges to be removed
(c) Filtered network
Figure 2: Edge filtering
graph |> activate(nodes) |> filter(type == "racf")
# A tbl_graph: 627 nodes and 0 edges
#
# A rooted forest with 627 trees
#
# Node Data: 627 × 4 (active)
name longitude latitude type
<chr> <dbl> <dbl> <chr>
1 1 ABERDEEN STREET RESERVOIR 145. -37.7 racf
2 1 ADENEY STREET CAMPERDOWN 143. -38.2 racf
3 1 AITKEN AVENUE DONALD 143. -36.4 racf
4 1 CARINYA CRESCENT KORUMBURRA 146. -38.4 racf
5 1 CHESTNUT ROAD DOVETON 145. -38.0 racf
6 1 CHIVERS ROAD TEMPLESTOWE 145. -37.8 racf
7 1 CLAYTON ROAD BALWYN 145. -37.8 racf
8 1 COBB ROAD MOUNT ELIZA 145. -38.2 racf
9 1 ELEANOR STREET HEYFIELD 147. -38.0 racf
10 1 FOLEY STREET TERANG 143. -38.2 racf
# ℹ 617 more rows
#
# Edge Data: 0 × 8
# ℹ 8 variables: from <int>, to <int>, weight <dbl>, long_hosp <dbl>,
# lat_hosp <dbl>, long_racf <dbl>, lat_racf <dbl>, category <chr>
# A tbl_graph: 815 nodes and 277 edges
#
# A directed acyclic simple graph with 562 components
#
# Edge Data: 277 × 4 (active)
from to weight category
<int> <int> <dbl> <chr>
1 1 629 218 RACF
2 1 641 191 RACF
3 7 631 112 RACF
4 11 672 247 RACF
5 19 676 182 RACF
6 26 700 104 RACF
7 27 629 303 RACF
8 32 629 149 RACF
9 35 640 131 RACF
10 39 662 140 RACF
# ℹ 267 more rows
#
# Node Data: 815 × 4
name longitude latitude type
<chr> <dbl> <dbl> <chr>
1 1 ABERDEEN STREET RESERVOIR 145. -37.7 racf
2 1 ADENEY STREET CAMPERDOWN 143. -38.2 racf
3 1 AITKEN AVENUE DONALD 143. -36.4 racf
# ℹ 812 more rows
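The node and edge filtering semantics described above can be reproduced on a small example. This is a minimal sketch under stated assumptions: the four nodes, the weights, and the threshold of 100 are hypothetical, and counts() is a helper defined here purely for illustration.

```r
library(tidygraph)
library(dplyr)

# Hypothetical toy network: two RACFs sending transfers to two hospitals
nodes <- tibble::tibble(
  name = c("RACF A", "RACF B", "Hospital X", "Hospital Y"),
  type = c("racf", "racf", "hospital", "hospital")
)
edges <- tibble::tibble(
  from   = c(1L, 1L, 2L),
  to     = c(3L, 4L, 3L),
  weight = c(150, 5, 40)
)
graph <- tbl_graph(nodes = nodes, edges = edges, directed = TRUE)

# Node filtering: dropping the hospitals also drops every edge incident to them
racf_only <- graph |>
  activate(nodes) |>
  filter(type == "racf")

# Edge filtering: all nodes are kept, even those left without any edge
# (the threshold of 100 is an arbitrary, hypothetical choice)
busy <- graph |>
  activate(edges) |>
  filter(weight > 100)

# Filtering on a network characteristic: nodes that receive transfers
receivers <- graph |>
  activate(nodes) |>
  filter(centrality_degree(mode = "in") > 0)

# Illustrative helper returning the number of nodes and edges
counts <- function(g) c(nodes = igraph::gorder(g), edges = igraph::gsize(g))
```

In this sketch, racf_only retains two nodes and no edges, because every edge ends at a hospital, while busy retains all four nodes and the single edge whose weight exceeds the threshold.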
Data visualisation
Data visualisation helps reveal patterns, anomalies, and relationships that may not be apparent from numerical summaries alone. Network data are often viewed as connections or flows between nodes or locations, and network-based visualisation makes them easier to communicate to a broader audience. For a simple network without spatial coordinates, placing nodes and edges in a visualisation requires a graph layout algorithm, such as the Kamada-Kawai layout (Kamada and Kawai 1989). Depending on the chosen algorithm, the positions of nodes and edges can differ even for the same network dataset. With spatial information, visualisation becomes more straightforward, as longitude and latitude can be used to specify the actual location of the nodes, with edges drawn as lines connecting these locations.
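The difference between an algorithmic layout and a coordinate-based layout can be seen by inspecting the node positions that ggraph computes. The sketch below uses a hypothetical three-node network with made-up long and lat values; it assumes ggraph version 2.0 or later, where the manual layout accepts x and y expressions evaluated on the node data.

```r
library(tidygraph)
library(ggraph)

# Hypothetical nodes with made-up coordinates (not the study data)
nodes <- tibble::tibble(
  name = c("RACF A", "RACF B", "Hospital X"),
  long = c(144.9, 145.2, 145.0),
  lat  = c(-37.7, -38.0, -37.8)
)
edges <- tibble::tibble(from = c(1L, 2L), to = c(3L, 3L))
graph <- tbl_graph(nodes = nodes, edges = edges, directed = TRUE)

# Algorithmic layout: positions are derived from the topology alone
layout_kk <- create_layout(graph, layout = "kk")

# Manual layout: positions are taken directly from the node coordinates
layout_geo <- create_layout(graph, layout = "manual", x = long, y = lat)

# The same manual layout used directly in a plot
p <- ggraph(graph, layout = "manual", x = long, y = lat) +
  geom_edge_link() +
  geom_node_point()
```

The x and y columns of layout_geo reproduce the supplied coordinates, whereas those of layout_kk depend on the layout algorithm and carry no geographic meaning.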
simple_graph |>
  ggraph(x = long, y = lat) +
  geom_sf(data = vic_map, color = "white") +
  geom_edge_link(alpha = 0.1) +
  geom_node_point(aes(color = category))
Figure 3: The ambulance transfer network in Victoria between residential aged care facilities and hospitals.
Visualising high-dimensional network data can be challenging, especially through static visualisation alone. The current tool for network visualisation in R is the ggraph package (Pedersen 2024a), which extends the ggplot2 package (Wickham 2016) to support relational data structures such as networks, graphs, and trees. The ggraph package is effective at visualising static networks, offering a range of layout algorithms for positioning nodes while keeping the familiar ggplot2 syntax. However, its support for interactive network visualisation is currently limited. Static network visualisation is difficult because only a limited amount of information can be mapped onto a single figure. As shown in Figure 3, even a simple network representation can quickly become cluttered. Answering detailed questions, such as the number of transfers between a specific RACF and hospital, or the name of a particular RACF, is difficult using static visualisation alone. Interactive visualisation helps to overcome these limitations by layering additional information onto the visualisation, allowing for further exploration.
interactive_vis_node <- simple_graph |>
  mutate(name = str_remove(name, "'")) |>
  ggraph(x = long, y = lat) +
  geom_sf(data = vic_map, color = "white") +
  geom_edge_link(alpha = 0.1) +
  geom_point_interactive(
    aes(x = x, y = y, color = category, tooltip = name, data_id = name)
  )

girafe(
  ggobj = interactive_vis_node,
  options = list(
    opts_hover(css = "fill:lightblue;stroke:grey;stroke-width:0.5px"),
    opts_zoom(min = 0.5, max = 3)
  )
)